ABSTRACT


In most contexts of student skills assessment, whether the 
test material is administered by the teacher or within a 
learning environment, there is a strong incentive to mini- 
mize the number of questions or exercises administered in 
order to get an accurate assessment. This minimization ob- 
jective can be framed as a Q-matrix design problem: given 
a set of skills to assess and a fixed number of question items, 
determine the optimal set of items, out of a potentially large 
pool, that will yield the most accurate assessment. In recent 
years, the Q-matrix identifiability under DINA/DINO mod- 
els has been proposed as a guiding principle for that purpose. 
We empirically investigate the extent to which identifiability 
can serve that purpose. Identifiability of Q-matrices is stud- 
ied throughout a range of conditions in an effort to measure 
and understand its relation to student skills assessment. The 
investigation relies on simulation studies of skills assessment 
with synthetic data. Results show that identifiability is an 
important factor that determines the capacity of a Q-matrix 
to lead to accurate skills assessment with the least number 
of questions. 


1. INTRODUCTION 


Consider a set of items intended to assess a student’s mas- 
tery over a set of skills, or knowledge components (KC). 
These items, along with the set of skills, can be designed 
to test a single skill at once. Or, they can be designed to 
involve two or more skills. A test composed of a fixed num- 
ber of items can either be composed of a mixture of single 
and multiple skills items, or composed of one type of items 
only. Skills can themselves be defined so as to facilitate the creation of task or problem items that involve a single skill per item, or multiple skills per item. By which principles should a teacher choose among these different options?


This paper addresses this question, with the general objective of designing a test that yields the most accurate assessment of a student's skill mastery state with the fewest question items.


The investigation is framed within the DINA model, a widely researched model that originated in research on a rule space method for obtaining diagnostic scores (Tatsuoka, 1983). In this model, a question item can involve one or more skills, and all of those skills are required to succeed at the item. A success can nevertheless occur through a guessing factor, and a failure can occur through a slip factor.
2. Q-MATRIX, DINA MODEL AND IDENTIFIABILITY


The mapping of items to skills is referred to as a Q-matrix, 
where items are mapped to latent skills whose mastery is 
deemed necessary in order for the student to succeed at the 
items. An item can represent a question, an exercise, or 
any task that can have a positive or negative outcome. In 
the DINA model, the conjunctive version of the Q-matrix is 
adopted: all skills are considered necessary for success. 


In the last decade, a number of papers have been devoted to 
deriving a Q-matrix from student test results data (Barnes, 
2010; Liu, Xu, & Ying, 2012; Desmarais, Xu, & Beheshti, 
2015; P. Xu & Desmarais, 2016). Another line of research on Q-matrices has been devoted to refining or validating an expert-given Q-matrix (de la Torre & Chiu, 2015; Chiu, 2013; Desmarais & Naceur, 2013). While the problems of deriving or refining a Q-matrix from data are related to Q-matrix design, they do not provide insight into how best to design one.


In parallel to these investigations, some researchers have looked at the question of identifiability. The general idea behind identifiability is that two or more configurations of model parameters can be considered equivalent. Sets of parameters are considered equivalent if, for example, their likelihoods are equal given a data sample. Alternatively, if the parameters are part of a generative model, two equivalent sets of parameters generate data having the same characteristics of interest, in particular equal joint probability distributions (see Doroudi & Brunskill, 2017, for more details).


The issue of identifiability for student skills assessment was first researched in the comparison of multiple diagnosis models (Yan, Almond, & Mislevy, 2004) and in Bayesian Knowledge Tracing (Beck & Chang, 2007), and was later discussed by other researchers (van De Sande, 2013; Doroudi & Brunskill, 2017). A mathematically rigorous treatment of Q-matrix identifiability under the DINA/DINO setting was presented first under zero slip and guess parameters (Chiu, Douglas, & Li, 2009), then under known slip and guess parameters (Liu, Xu, & Ying, 2013), and finally under unknown slip and guess parameters (Chen, Liu, Xu, & Ying, 2015). An overall discussion can also be found in G. Xu & Zhang (2015) and Qin et al. (2015). These studies provide a theoretical basis for deriving Q-matrices from data, but not for the design of Q-matrices itself. In this paper, we consider the identifiability of the Q-matrix with regard to the DINA model.


Identifiability is a general concept for statistical models. Its formal definition is:

Definition (1) (Casella & Berger, 2002) A parameter $\theta$ for a family of distributions $\{f(x \mid \theta) : \theta \in \Theta\}$ is identifiable if distinct values of $\theta$ correspond to distinct pdfs or pmfs. That is, if $\theta \neq \theta'$, then $f(x \mid \theta)$ is not the same function of $x$ as $f(x \mid \theta')$.


The DINA model has parameters $\theta = \{Q, p, s, g\}$, where $Q$ is the Q-matrix and $p$ is the categorical distribution parameter over all student profile categories. That is, $p$ gives the probability that a student belongs to each profile category. For example, in the 3-skill case, there are $2^3 = 8$ categories a student can belong to, and the model parameter $p$ is the 8-component vector of the probabilities of belonging to each of these categories. Finally, $s$ and $g$ are vectors denoting the slip and guess parameters of each item.


The identifiability of all parameters in the DINA model has been thoroughly investigated and several theorems are given (G. Xu & Zhang, 2015). For the Q-matrix design problem that is the focus of this paper, however, we only need to ensure that the model parameter $p$ is identifiable, meaning that we can distinguish the different profile categories. Fortunately, when $s$ and $g$ are known, this requirement is easily satisfied, since it only requires the Q-matrix to be complete.


Definition (2) (Chen et al., 2015) The matrix $Q$ is complete if $\{e_i : i = 1, \ldots, K\} \subset R_Q$, where $K$ is the number of skills (columns of $Q$), $R_Q$ is the set of row vectors of $Q$, and $e_i$ is a row vector whose $i$-th element is one and whose other elements are zero (i.e., a binary unit vector, also known as a "one-hot vector"). Stated differently, the rows of the identity matrix $I_{K \times K}$ must all appear in $Q$ for this matrix to be complete.


The current investigation rests on the following proposition:

Proposition (Chen et al., 2015) Under the DINA and DINO models, with $Q$, $s$ and $g$ known, the population proportion parameter $p$ is identifiable if and only if $Q$ is complete.


The following Q-matrix is an example that is not complete:

       k1  k2  k3
q1      1   0   0
q2      0   1   1
q3      1   0   1


This Q-matrix contains neither $e_2 = [0, 1, 0]$ nor $e_3 = [0, 0, 1]$, and is therefore not complete, even though its items (rows) cover all skills (columns). According to the proposition above, using this Q-matrix under the DINA model entails that the model parameters are not identifiable, which in turn compromises student profile diagnosis. In fact, students who only master skill 2 and students who only master skill 3 are indistinguishable under this Q-matrix: neither group masters all the skills required by any item, so both have the same probability of success on every item.
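The completeness check and the resulting indistinguishable profiles can be verified mechanically. The following is a minimal Python sketch, not part of the original study; the function names `is_complete` and `ideal_response` are hypothetical helpers introduced here. It groups the 8 skill profiles by the latent (slip- and guess-free) response pattern they produce on the example Q-matrix.

```python
from itertools import product

def is_complete(Q):
    """Return True if every one-hot vector e_1..e_K appears as a row of Q."""
    K = len(Q[0])
    rows = {tuple(r) for r in Q}
    return all(tuple(int(i == k) for i in range(K)) in rows for k in range(K))

def ideal_response(alpha, Q):
    """Latent DINA response: item j is solved iff alpha has every skill in row j."""
    return tuple(int(all(a >= q for a, q in zip(alpha, row))) for row in Q)

# The incomplete example Q-matrix from the text (rows q1..q3, columns k1..k3).
Q = [(1, 0, 0), (0, 1, 1), (1, 0, 1)]

# Group the 2^K skill profiles by the ideal response pattern they produce.
groups = {}
for alpha in product((0, 1), repeat=3):
    groups.setdefault(ideal_response(alpha, Q), []).append(alpha)

print(is_complete(Q))  # False
for pattern, profiles in groups.items():
    print(pattern, profiles)
```

Profiles that land in the same group cannot be told apart by this Q-matrix, whatever the number of students observed.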


But while the use of a non-identifiable Q-matrix should be avoided according to the proposition, the question remains: among all complete Q-matrices, which ones are the most efficient for student profile diagnosis?


In the next section, we investigate empirically the Q-matrix 
design options in light of the completeness requirement, 
using synthetic student performance data with the DINA 
model. Synthetic data is essential for this investigation be- 
cause we need to know the underlying ground truth. We 
return to the issue of using real data in the conclusion. 


3. EXPERIMENT 


The Q-matrix design problem is essentially an optimization problem. We have a pool of candidate Q-matrices, each formed by selecting rows, with replacement, from a pool of q-vectors. Each Q-matrix has some capacity to diagnose students, as measured by a loss function, and we aim to choose a Q-matrix that minimizes this loss.


Our experiments follow a Bayesian framework to diagnose students under DINA Q-matrices. First, we use one-hot encoding to denote the profile categories. Let $M$ be the number of profile categories; in the 3-skill case there are $M = 8$ profile categories $pc_1, \ldots, pc_8$. A student belonging to profile $pc_1$ is encoded as the binary unit vector $\alpha_1 = (1, 0, 0, 0, 0, 0, 0, 0)$, a student belonging to $pc_2$ as $\alpha_2 = (0, 1, 0, 0, 0, 0, 0, 0)$, and so on up to $pc_8$, encoded as $\alpha_8 = (0, 0, 0, 0, 0, 0, 0, 1)$. The DINA model parameter $p$ is represented as a probability vector $p = (p_1, p_2, \ldots, p_8) = (P(\alpha_1), P(\alpha_2), \ldots, P(\alpha_8))$. We then set the prior over student profiles to be uniform:

\[ \alpha_0 = (1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8) \]


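Assuming conditional independence (i.e., conditioned on a given profile category, the probabilities of answering each question correctly are independent), the likelihood is given by (De La Torre, 2009; Chen et al., 2015):

\[
\begin{aligned}
L(p, Q, s, g \mid X) &= P(X \mid p, Q, s, g) \\
&= \prod_{i=1}^{I} \sum_{\alpha} p_{\alpha} \, P(X_i \mid \alpha, Q, s, g) \\
&= \prod_{i=1}^{I} \sum_{\alpha} p_{\alpha} \prod_{j=1}^{J} P_j(\alpha)^{X_{ij}} \, [1 - P_j(\alpha)]^{1 - X_{ij}}
\end{aligned}
\tag{1}
\]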


in which $X$ is the response matrix and $X_i$ is its $i$-th row, $I$ is the number of records (students), and $J$ is the number of questions. $P_j(\alpha)$ is the probability that a student with profile $\alpha$ answers question $j$ correctly. Note that in the 3-skill case $\alpha$ has only 8 possible values; for any of them, $\alpha_m$, $m = 1, \ldots, 8$, the probability is given by the DINA model:

\[ P_j(\alpha_m) = P(X_{ij} = 1 \mid \alpha_m) = g_j^{1 - \eta_{mj}} (1 - s_j)^{\eta_{mj}} \]

in which $\eta_{mj}$ is the latent response of profile $\alpha_m$ to question $j$, that is, the response when slip and guess are 0. It can be calculated by

\[ \eta_{mj} = \prod_{k=1}^{K} \alpha_{mk}^{q_{jk}} \]

where $K$ is the number of skills, $\alpha_{mk}$ indicates whether profile $\alpha_m$ masters skill $k$, and $q_{jk}$ is the $(j, k)$-th element of the Q-matrix $Q$.
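The two formulas above translate directly into code. The sketch below is illustrative only: the function names and the slip and guess values (0.1 and 0.2) are assumptions chosen for the example, not values used in the experiments.

```python
def latent_response(alpha, q):
    """eta = prod_k alpha_k ** q_k: 1 iff every skill required by q is mastered."""
    eta = 1
    for a, qk in zip(alpha, q):
        eta *= a ** qk  # 0 ** 0 == 1 in Python, so skills not required are ignored
    return eta

def p_correct(alpha, q, slip, guess):
    """DINA success probability: guess ** (1 - eta) * (1 - slip) ** eta."""
    eta = latent_response(alpha, q)
    return guess ** (1 - eta) * (1 - slip) ** eta

q = (1, 0, 1)                 # item requiring skills 1 and 3
master = (1, 1, 1)            # profile mastering every skill
non_master = (1, 1, 0)        # profile missing skill 3

print(p_correct(master, q, slip=0.1, guess=0.2))      # 0.9, i.e. 1 - slip
print(p_correct(non_master, q, slip=0.1, guess=0.2))  # 0.2, i.e. guess
```

A student therefore succeeds with probability 1 - s_j when all required skills are mastered, and with probability g_j otherwise.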


Given the prior and the likelihood, the posterior $\hat{\alpha}$ for each student can be calculated. It has the form

\[ \hat{\alpha} = (\hat{p}_1, \hat{p}_2, \hat{p}_3, \hat{p}_4, \hat{p}_5, \hat{p}_6, \hat{p}_7, \hat{p}_8) \]

and we then calculate the loss between this posterior and the true profile $\alpha_{\mathrm{true}}$, which is one of the one-hot encoding vectors.
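For reference, each component of the posterior can be written out with Bayes' rule. The expression below is a sketch that follows from the prior $\alpha_0$ and the per-profile term of Equation (1); the notation $\alpha_{0,m}$ for the $m$-th component of the prior and the summation index $m'$ are introduced here and are not part of the original notation. For student $i$ with responses $X_{i1}, \ldots, X_{iJ}$:

\[
\hat{p}_m = \frac{\alpha_{0,m} \prod_{j=1}^{J} P_j(\alpha_m)^{X_{ij}} \, [1 - P_j(\alpha_m)]^{1 - X_{ij}}}{\sum_{m'=1}^{M} \alpha_{0,m'} \prod_{j=1}^{J} P_j(\alpha_{m'})^{X_{ij}} \, [1 - P_j(\alpha_{m'})]^{1 - X_{ij}}}, \qquad m = 1, \ldots, M.
\]

With the uniform prior used here, the factors $\alpha_{0,m}$ cancel and the posterior is simply the normalized likelihood of each profile.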


For any Q-matrix configuration, the loss function is defined by

\[ \mathrm{loss}(Q) = \sum_{i \in \text{students}} \lVert \hat{\alpha} - \alpha_{\mathrm{true}} \rVert^2 \]
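The posterior and the per-student loss term can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, the norm is taken to be the squared Euclidean norm, and the identity Q-matrix with slip = guess = 0.01 is chosen only as an example.

```python
from itertools import product

def p_correct(alpha, q, slip, guess):
    """DINA success probability for skill profile alpha on an item with q-vector q."""
    mastered = all(a >= qk for a, qk in zip(alpha, q))
    return 1.0 - slip if mastered else guess

def posterior(x, Q, slip, guess, profiles, prior):
    """Bayes rule over profile categories for one response vector x."""
    weights = []
    for alpha, pr in zip(profiles, prior):
        like = 1.0
        for xj, q, s, g in zip(x, Q, slip, guess):
            p = p_correct(alpha, q, s, g)
            like *= p if xj == 1 else 1.0 - p
        weights.append(pr * like)
    total = sum(weights)
    return [w / total for w in weights]

def squared_loss(post, true_index):
    """Squared Euclidean distance between the posterior and the one-hot truth."""
    return sum((p - (1.0 if m == true_index else 0.0)) ** 2
               for m, p in enumerate(post))

profiles = list(product((0, 1), repeat=3))       # the 8 skill profiles
prior = [1.0 / len(profiles)] * len(profiles)    # uniform prior
Q = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]            # complete Q-matrix (assumed example)
slip = guess = [0.01, 0.01, 0.01]

x = (1, 0, 1)                                    # observed responses of one student
post = posterior(x, Q, slip, guess, profiles, prior)
true_index = profiles.index((1, 0, 1))
loss_value = squared_loss(post, true_index)
print(max(range(len(post)), key=post.__getitem__) == true_index)  # True
print(loss_value)
```

Summing `squared_loss` over all simulated students gives the total loss of a Q-matrix configuration.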


To implement the experiment, for each Q-matrix configuration we generate a response matrix from the DINA model with fixed slip and guess parameters, using the function 'DINAsim' from the R package DINA (Culpepper, 2015). We then calculate the posterior estimate for every student and evaluate the total loss. The reported result is the average loss over 100 runs.
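The procedure just described can be sketched end to end as follows. This is a simplified Python stand-in, not the R code used in the study: responses are drawn with Python's `random` module instead of 'DINAsim', a single slip value and a single guess value are shared by all items, and all function names are hypothetical.

```python
import random
from itertools import product

def p_correct(alpha, q, slip, guess):
    """DINA success probability for skill profile alpha on q-vector q."""
    mastered = all(a >= qk for a, qk in zip(alpha, q))
    return 1.0 - slip if mastered else guess

def simulate(alpha, Q, slip, guess, rng):
    """Draw one 0/1 response per item for a student with profile alpha."""
    return [int(rng.random() < p_correct(alpha, q, slip, guess)) for q in Q]

def posterior(x, Q, slip, guess, profiles):
    """Posterior over profiles under a uniform prior (normalized likelihood)."""
    weights = []
    for alpha in profiles:
        like = 1.0
        for xj, q in zip(x, Q):
            p = p_correct(alpha, q, slip, guess)
            like *= p if xj else 1.0 - p
        weights.append(like)
    total = sum(weights)
    return [w / total for w in weights]

def total_loss(Q, slip, guess, per_profile, rng):
    """Sum of squared distances to the one-hot truth over all simulated students."""
    profiles = list(product((0, 1), repeat=len(Q[0])))
    loss = 0.0
    for m, alpha in enumerate(profiles):
        for _ in range(per_profile):
            x = simulate(alpha, Q, slip, guess, rng)
            post = posterior(x, Q, slip, guess, profiles)
            loss += sum((p - (i == m)) ** 2 for i, p in enumerate(post))
    return loss

def average_loss(Q, slip, guess, per_profile=25, runs=100, seed=0):
    """Average total loss over independent simulation runs."""
    rng = random.Random(seed)
    return sum(total_loss(Q, slip, guess, per_profile, rng)
               for _ in range(runs)) / runs

complete = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
incomplete = [(1, 0, 0), (1, 0, 0), (1, 0, 0)]
print(average_loss(complete, slip=0.01, guess=0.01))
print(average_loss(incomplete, slip=0.01, guess=0.01))
```

With 25 students per profile and 100 runs, the defaults mirror the 3-skills setting described in the text; the printed values depend on the random seed.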


In our experiments, we consider the 3-skills and 4-skills cases. For the 3-skills case, experiments are conducted with N = 200 students, with 25 students in each of the 8 categories. For the 4-skills case, we use N = 400 students, with 25 students in each of the 16 categories.


3.1 Experiment 1: Comparison of three strategies

In the first experiment, we compare three different Q-matrix design strategies. Each is based on repeating a specific pool of q-vectors.


• Strategy 1 (Q-matrix 1): Using the identifiability condition (definition (1)) by using only combinations of the vectors $\{e_i : i = 1, \ldots, K\}$ (binary unit vectors, or one-hot encodings).

• Strategy 2 (Q-matrix 2): Using the vectors $\{e_i : i = 1, \ldots, K\}$ plus an all-one vector, $(1, 1, 1)$ in the 3-skill case or $(1, 1, 1, 1)$ in the 4-skill case. This is inspired by orthogonal array design, a commonly used design of experiments (Montgomery, 2017).

• Strategy 3 (Q-matrix 3): Repeatedly using all q-vectors.


For the 3-skills case, the three Q-matrices are shown in Figure 1. The general pattern is to recycle the rows above the lines denoted by ...[..., ..., ...].

The 4-skills case is similar and is omitted here. Results for these two cases are shown in Figure 2a and Figure 2b.
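The recycling pattern behind the three strategies can be sketched as follows. This is a hypothetical Python illustration, not the authors' code: the helper names are invented here, and the ordering of the q-vectors in strategy 3 (by number of skills, then by position) is an assumption chosen to match the layout of Figure 1.

```python
from itertools import cycle, islice, product

def unit_vectors(K):
    """The one-hot vectors e_1..e_K."""
    return [tuple(int(i == k) for i in range(K)) for k in range(K)]

def all_q_vectors(K):
    """All 2^K - 1 non-zero q-vectors, ordered by number of skills, then position."""
    vecs = [v for v in product((0, 1), repeat=K) if any(v)]
    return sorted(vecs, key=lambda v: (sum(v), [-x for x in v]))

def recycle(pool, J):
    """Repeat the pool of q-vectors in order until J rows are produced."""
    return list(islice(cycle(pool), J))

def strategy(n, K, J):
    """Build the J-row Q-matrix of strategy n (1, 2 or 3) for K skills."""
    if n == 1:
        pool = unit_vectors(K)
    elif n == 2:
        pool = unit_vectors(K) + [(1,) * K]
    else:
        pool = all_q_vectors(K)
    return recycle(pool, J)

for n in (1, 2, 3):
    print(n, strategy(n, K=3, J=7))
```

Truncating each recycled sequence at J rows gives the Q-matrix of a J-question test under that strategy.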


Q-matrix 1 (binary unit vectors)

        k1   k2   k3
q1       1    0    0
q2       0    1    0
q3       0    0    1
...    ...  ...  ...
q19      1    0    0
q20      0    1    0
q21      0    0    1

Q-matrix 2 (binary unit + all-1s vectors)

        k1   k2   k3
q1       1    0    0
q2       0    1    0
q3       0    0    1
q4       1    1    1
...    ...  ...  ...
q17      1    0    0
q18      0    1    0
q19      0    0    1
q20      1    1    1
q21      1    0    0

Q-matrix 3 (all combinations)

        k1   k2   k3
q1       1    0    0
q2       0    1    0
q3       0    0    1
q4       1    1    0
q5       1    0    1
q6       0    1    1
q7       1    1    1
...    ...  ...  ...
q15      1    0    0
q16      0    1    0
q17      0    0    1
q18      1    1    0
q19      1    0    1
q20      0    1    1
q21      1    1    1

Figure 1: Q-matrix design strategies


3.2 Experiment 2: Find best configuration
The second experiment takes a brute-force approach: we directly examine all possible Q-matrix configurations. First, for a given pool of q-vectors to choose from and a given number of questions, we need to know the number of possible Q-matrix configurations. This is a classical combinatorial problem, namely allocating marbles (q-vectors) to bins (questions), and the count can be computed with binomial coefficients and interpreted through the stars and bars method. For example, in the 3-skills case we have 7 q-vectors, and with 4 questions to allocate them to we have $\binom{7 + 4 - 1}{4} = 210$ possible configurations. This number grows sharply as the number of questions or the number of q-vectors increases. As a comparison, in the 4-skills case with 15 q-vectors and 5 questions, we have $\binom{15 + 5 - 1}{5} = 11628$ possible configurations.
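The counts above can be checked, and the configurations enumerated, with the Python standard library. This is a small sketch with hypothetical function names; it treats a configuration as a multiset of q-vectors, which is what the stars and bars argument counts.

```python
from itertools import combinations_with_replacement, product
from math import comb

def n_configurations(K, J):
    """Number of multisets of size J drawn from the 2^K - 1 non-zero q-vectors."""
    n_vectors = 2 ** K - 1
    return comb(n_vectors + J - 1, J)

def enumerate_configurations(K, J):
    """Yield every Q-matrix configuration as a tuple of J q-vectors."""
    pool = [v for v in product((0, 1), repeat=K) if any(v)]
    return combinations_with_replacement(pool, J)

print(n_configurations(3, 4))   # 210
print(n_configurations(4, 5))   # 11628
print(n_configurations(3, 8))   # 3003
print(sum(1 for _ in enumerate_configurations(3, 4)))  # 210
```

The third value, for the 3-skills case with 8 questions, is the number of configurations examined in that setting of the experiment.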


For each configuration, we calculate the MAP estimation for 
all categories of each student, and compare with the one-hot 
encoding for their true categories. The total loss is reported 
as the performance index. 


Figure 3 shows the results for 6 combinations of numbers of skills, numbers of questions, and slip and guess values:

• 3-skills case, 4 questions: Figure 3a, Figure 3b
• 3-skills case, 8 questions: Figure 3c, Figure 3d
• 4-skills case, 5 questions: Figure 3e, Figure 3f
(b) 4-skills case


Figure 2: Experiment 1: Three Strategy Comparison on 3- and 4-skills cases 


4. DISCUSSION 


From the results of experiment 1 we can see that strategy 1 always works better than the other two strategies, meaning that simply repeating the vectors $\{e_i : i = 1, \ldots, K\}$ in the Q-matrix design, without using any combination of skills, yields better student diagnosis performance.


From the results of experiment 2, when the slip and guess parameters are as low as 0.01, we can see clearly graded patterns among the configurations. These can be explained by the distinguishability of a Q-matrix. For example, Figure 3a shows 7 layers. The first layer consists of Q-matrices that can only cluster students into 2 categories. One example of such a Q-matrix is

       k1  k2  k3
q1      1   0   0
q2      1   0   0
q3      1   0   0
q4      1   0   0

This Q-matrix can only discriminate between students who have mastered skill 1 and those who have not. Since there are in fact 8 categories of students, the 7 layers in Figure 3a, from top to bottom, correspond to the Q-matrices that can separate students into 2 to 8 categories. Complete Q-matrices always fall in the bottom layer, which concurs with the proposition of Section 2. The 4-skills case, shown in Figure 3e, is similar.
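The link between layers and the number of separable categories can be checked by brute force. The Python sketch below is an illustration with hypothetical helper names: for every 4-question configuration in the 3-skills case, it counts the distinct latent response patterns across the 8 profiles, which is the number of categories that configuration can separate when slip and guess are zero.

```python
from itertools import combinations_with_replacement, product

def n_classes(Q):
    """Number of distinct latent response patterns over all 2^K skill profiles."""
    K = len(Q[0])
    patterns = {
        tuple(int(all(a >= q for a, q in zip(alpha, row))) for row in Q)
        for alpha in product((0, 1), repeat=K)
    }
    return len(patterns)

def is_complete(Q):
    """True if every one-hot vector appears as a row of Q."""
    K = len(Q[0])
    rows = set(Q)
    return all(tuple(int(i == k) for i in range(K)) in rows for k in range(K))

pool = [v for v in product((0, 1), repeat=3) if any(v)]
counts = {}
for Q in combinations_with_replacement(pool, 4):
    counts.setdefault(n_classes(Q), []).append(Q)

print(sorted(counts))                             # distinct numbers of categories
print({n: len(qs) for n, qs in sorted(counts.items())})
print(all(is_complete(Q) for Q in counts[8]))     # True
```

The configurations that separate all 8 categories are exactly the complete ones, in line with the proposition of Section 2.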


When the slip and guess parameters increase, the points become more dispersed, as can be seen by comparing Figures 3a and 3b. To examine this in greater detail, we distinguish three types of Q-matrices.

• Type I: Complete and confined, meaning it consists only of the vectors $\{e_i : i = 1, \ldots, K\}$.

• Type II: Complete but not confined, meaning it contains all the vectors $\{e_i : i = 1, \ldots, K\}$ and also at least one other q-vector.

• Type III: Incomplete Q-matrix.

Type I and Type II Q-matrices perform the same when slip and guess are low (Figures 3a, 3e), but when slip and guess are higher, Type I Q-matrices perform better (Figures 3b, 3f).
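The three types can be assigned mechanically from the rows of a Q-matrix. The function below is a minimal sketch; its name and the string labels it returns are hypothetical conveniences, not notation from the study.

```python
def q_matrix_type(Q):
    """Classify a Q-matrix as Type "I", "II" or "III" from its set of rows."""
    K = len(Q[0])
    units = {tuple(int(i == k) for i in range(K)) for k in range(K)}
    rows = {tuple(r) for r in Q}
    if not units <= rows:
        return "III"                          # incomplete: some e_i is missing
    return "I" if rows == units else "II"     # confined vs. not confined

print(q_matrix_type([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]))  # I
print(q_matrix_type([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]))  # II
print(q_matrix_type([(1, 0, 0), (0, 1, 1), (1, 0, 1)]))             # III
```

Only the set of distinct rows matters for the type, so repeated rows do not change the classification.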


However, when more questions are involved in a high slip and guess condition, the performance becomes more unstable. We therefore consider further subtypes. In the 3-skills case with 8 questions, we consider the three subtypes below.

• Subtype 1: The Q-matrix contains each vector of $\{e_i : i = 1, \ldots, K\}$ at least twice.

• Subtype 2: Other situations (e.g., a complete Q-matrix in which all the other vectors are just repetitions of $e_1$).

• Subtype 3: The Q-matrix contains all q-vectors.

From Figure 3d we can see that subtype 1 (denoted by triangles) performs better than subtype 2, meaning that repeating the whole set $\{e_i : i = 1, \ldots, K\}$ is a better strategy, just like strategy 1 in experiment 1. Subtype 3 corresponds to strategy 3 in experiment 1; it has only 7 possible configurations in the 8-question setting, and we can see that they do not perform well.


Therefore, we argue that the best Q-matrix design is to use only the vectors $\{e_i : i = 1, \ldots, K\}$, since it offers faster convergence (as shown in experiment 1) and better robustness against slip and guess (as shown in both experiments 1 and 2).


5. CONCLUSION 


This work is still at an early stage and has limitations, in particular because it is conducted with synthetic data. But the main finding is wide-reaching and warrants further investigation. The support for designing Q-matrices that satisfy the identifiability condition with single-skill items is compelling in the experiments conducted with synthetic data. The results clearly show that such matrices yield more accurate student skills assessment. In particular, they show that Q-matrices containing items that span the whole range of potential combinations of skills tend to yield less accurate skills assessment than Q-matrices that simply repeat the pattern of single-skill items.

[Figure 3 plots: y-axis "loss", x-axis "configuration"; points marked by type (I, II, III) and by subtype. Panels: (a) 3-skills case, slip=guess=0.01, J=4; (c) 3-skills case, slip=guess=0.01, J=8; (e) 4-skills case, slip=guess=0.01, J=5]

Figure 3: Experiment 2: Configurations for different slip and guess parameters, numbers of skills, and numbers of questions, J.

Proceedings of the 11th International Conference on Educational Data Mining 441


The finding that tests composed of single-skill items are better for skills assessment is somewhat counter-intuitive, as intuition suggests that a good test should also include items with combinations of skills. But intuition also suggests that items that involve combinations of skills are more difficult, and this may not simply be because they involve more than one skill. It might be that solving items that combine different skills in a single problem is a new skill in itself. This conjecture is in fact probably familiar to a majority of educators, and the current work provides formal evidence to support it. The immediate consequence is that Q-matrices, as we currently conceive them, fail to reflect that a task that combines skills involves a new skill.


Ideally, future work should be conducted with real data. However, given that we do not know the real Q-matrix that underlies real data, investigating the questions raised by the current study is nontrivial. Meanwhile, further experiments with synthetic data can be considered, with different choices of student profile distribution and different numbers of skills. In addition, the case where slip and guess are unknown should also be considered, which involves a different identifiability requirement (G. Xu & Zhang, 2015).